Distribution Functions

Preface

Open RStudio to do the practicals. Tasks marked with * are optional.

R version 4.2.1 (2022-06-23 ucrt)

Normal distribution

Task 1

Define a set of approximately 100 equally spaced values between -5 and +5.
Plot the standard normal density function (mean = 0, sd = 1).

Use the function seq(…) to generate a sequence. Use the function dnorm(…) for the normal density function.

Solution 1

x <- seq(from = -5, to = 5, by = 0.1)
y <- dnorm(x)
plot(x, y, type = "l", col = "red", lwd = 2, 
     main = "Standard normal density function")
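The task asks for approximately 100 values, and seq() with by = 0.1 yields 101. As a minimal sketch, the length.out argument of seq() requests an exact number of points instead. The sketch also checks dnorm(0) against the known peak height \(1/\sqrt{2\pi}\) of the standard normal density.

```r
# Request the number of points directly instead of the step size
x <- seq(from = -5, to = 5, length.out = 100)
y <- dnorm(x)  # mean = 0 and sd = 1 are the defaults of dnorm()

length(x)      # exactly 100 points
# [1] 100

# The standard normal density peaks at x = 0 with height 1 / sqrt(2 * pi)
dnorm(0)
# [1] 0.3989423

plot(x, y, type = "l", col = "red", lwd = 2,
     main = "Standard normal density function")
```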

Task 2

Define a set of approximately 100 equally spaced values between -10 and +10.
Plot a normal density function of mean 2 and standard deviation 1.

Use the function dnorm(…) for the normal density function. Check the arguments mean and sd.

Solution 2

x <- seq(from = -10, to = 10, by = 0.2)
y <- dnorm(x, mean = 2, sd = 1)
plot(x, y, type = "l", col = "red", lwd = 2, 
     main = "Normal density function (mean = 2, sd = 1)")
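Changing the mean only shifts the curve along the x-axis. The sketch below (variable names are illustrative) checks that dnorm(x, mean = 2, sd = 1) equals the standard normal density evaluated at x - 2, and overlays both curves.

```r
x <- seq(from = -10, to = 10, by = 0.2)
y_shifted <- dnorm(x, mean = 2, sd = 1)

# Same curve, obtained by shifting the argument of the standard normal density
y_standard_shifted <- dnorm(x - 2)
all.equal(y_shifted, y_standard_shifted)
# [1] TRUE

# Standard normal (black) and the density with mean 2 (red)
plot(x, dnorm(x), type = "l", lwd = 2, ylab = "Density",
     main = "Normal densities with mean 0 and mean 2")
lines(x, y_shifted, col = "red", lwd = 2)
```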

Task 3

Define a set of approximately 100 equally spaced values between -10 and +10.
Calculate the probability of observing a value smaller than or equal to 3 in a normal distribution with mean = 2 and sd = 1.
Calculate the 50th percentile (the median) of the standard normal distribution.
Calculate the 95th percentile of the standard normal distribution.

Use the function pnorm(…) for the normal cumulative distribution function (cdf). Use the function qnorm(…) for the inverse of the cdf.

Solution 3

x <- seq(from = -10, to = 10, by = 0.2)
pnorm(q = 3, mean = 2, sd = 1)

## [1] 0.8413447

qnorm(0.5)

## [1] 0

qnorm(0.95)

## [1] 1.644854
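qnorm() is the inverse of pnorm(): passing a probability obtained from pnorm() back through qnorm() returns the original value. A minimal check, using a few arbitrary values of q:

```r
q <- c(-1, 0, 2, 3)
p <- pnorm(q, mean = 2, sd = 1)        # probabilities P(X <= q)
q_back <- qnorm(p, mean = 2, sd = 1)   # back to the original values
all.equal(q, q_back)
# [1] TRUE

# The median of a normal distribution equals its mean
qnorm(0.5, mean = 2, sd = 1)
# [1] 2
```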

Task 4*

Define a set of approximately 100 equally spaced values between -10 and +10.
Calculate the probability of observing a value larger than 3 in a normal distribution with mean = 2 and sd = 1.

Use the function pnorm(…) for the normal cumulative distribution function (cdf).

Solution 4*

x <- seq(from = -10, to = 10, by = 0.2)
pnorm(q = 3, mean = 2, sd = 1, lower.tail = FALSE)

## [1] 0.1586553
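lower.tail = FALSE returns P(X > q) directly. Here it gives the same result as 1 - pnorm(q), but it keeps precision far in the upper tail, where 1 - pnorm(q) rounds to 0:

```r
upper <- pnorm(q = 3, mean = 2, sd = 1, lower.tail = FALSE)
complement <- 1 - pnorm(q = 3, mean = 2, sd = 1)
all.equal(upper, complement)
# [1] TRUE

# Far in the tail, pnorm(10) is stored as exactly 1, so the complement is 0
1 - pnorm(10)
# [1] 0
# lower.tail = FALSE still returns a tiny positive probability
pnorm(10, lower.tail = FALSE)
```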

Task 5*

Define a set of approximately 100 equally spaced values between -10 and +10.
What is the probability of observing a value between 2 and 4 in a normal distribution with mean = 2 and sd = 1?

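Use the function pnorm(…) for the normal cumulative distribution function (cdf). The probability of a value between 2 and 4 is the area to the left of 4 minus the area to the left of 2: P(2 ≤ X ≤ 4) = P(X ≤ 4) - P(X ≤ 2).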

Solution 5*

x <- seq(from = -10, to = 10, by = 0.2)
pnorm(q = 4, mean = 2, sd = 1) - pnorm(2, mean = 2, sd = 1)

## [1] 0.4772499
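As a sanity check on the difference of cdf values, one can simulate draws with rnorm() and count the proportion that falls between 2 and 4. The seed and the number of draws below are arbitrary choices; the Monte Carlo estimate should only agree with the exact value to about two decimal places.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
draws <- rnorm(n = 1e5, mean = 2, sd = 1)

# Proportion of simulated values in [2, 4]
mc_estimate <- mean(draws >= 2 & draws <= 4)

# Exact value from the cdf
exact <- pnorm(q = 4, mean = 2, sd = 1) - pnorm(q = 2, mean = 2, sd = 1)

c(monte_carlo = mc_estimate, exact = exact)
```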

\(t\)-distribution

Task 1

Generate 50 random values from the t-distribution. Recall that the t-statistic computed from a sample of size n follows a t-distribution with n - 1 degrees of freedom; here, use df = 50 - 1.
Plot these values as a histogram.

Use the function rt(…) to generate random values from the t-distribution.

Solution 1

x <- rt(n = 50, df = 50 - 1)
hist(x)
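A histogram of 50 draws is noisy. Drawing it on the density scale (freq = FALSE) allows overlaying the theoretical t density from dt() and, for comparison, the standard normal density. The seed is an arbitrary choice for reproducibility.

```r
set.seed(123)                       # arbitrary seed
x <- rt(n = 50, df = 50 - 1)
hist(x, freq = FALSE, main = "50 draws from a t-distribution with df = 49")

# Theoretical t density (red) and standard normal density (blue, dashed)
curve(dt(x, df = 50 - 1), col = "red", lwd = 2, add = TRUE)
curve(dnorm(x), col = "blue", lwd = 2, lty = 2, add = TRUE)

# The t density has a lower peak (and heavier tails) than the normal;
# the peak rises towards dnorm(0) as df grows
dt(0, df = c(2, 10, 49))
dnorm(0)
```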

Task 2

Consider a t-distribution with df = 6 and evaluate its probability density function at t = -6, -4, -2, 0, 2, 4, 6.
Calculate the area under the t-curve over the intervals (-Inf, -2] and [2, Inf) for a random variable following a t-distribution with df = 6.

Use the function dt(…) for the probability density function from the t-distribution. Use the function pt(…) for the cumulative distribution function.

Solution 2

x <- c(-6, -4, -2, 0, 2, 4, 6)
dt(x, df = 6)

## [1] 0.0004217475 0.0040545779 0.0640361226 0.3827327723 0.0640361226 0.0040545779 0.0004217475

pt(q = -2, df = 6)

## [1] 0.04621316

pt(q = 2, df = 6, lower.tail = FALSE)

## [1] 0.04621316
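Because the t-distribution is symmetric around 0, the two tail areas coincide and dt() returns the same value at t and -t. A short check, which also computes the total area outside [-2, 2]:

```r
lower <- pt(q = -2, df = 6)                     # area of (-Inf, -2]
upper <- pt(q = 2, df = 6, lower.tail = FALSE)  # area of [2, Inf)
all.equal(lower, upper)

two_sided <- lower + upper                      # area outside [-2, 2]
two_sided

# Symmetric density: f(-t) = f(t)
all.equal(dt(c(-6, -4, -2), df = 6), dt(c(6, 4, 2), df = 6))
```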

Task 3*

Generate 60 random values from the t-distribution and present them as a histogram. As in Task 1, use df = sample size minus one, i.e. df = 60 - 1.
Calculate the 90th percentile of the \(t\)-distribution with df = 60 - 1.
Calculate the 95th percentile of the \(t\)-distribution with df = 60 - 1.
Increase the sample size to 150 and repeat all above steps.

Use the function qt(…) for the inverse of the cumulative distribution function.

Solution 3*

x <- rt(n = 60, df = 60 - 1)
hist(x)

qt(p = 0.90, df = 60 - 1)

## [1] 1.296066

qt(p = 0.95, df = 60 - 1)

## [1] 1.671093

x <- rt(n = 150, df = 150 - 1)
hist(x)

qt(p = 0.90, df = 150 - 1)

## [1] 1.287259

qt(p = 0.95, df = 150 - 1)

## [1] 1.655145
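The quantiles above shrink slightly as the degrees of freedom increase. The sketch below evaluates qt() for a few df values (chosen only for illustration) and compares them with the standard normal quantile, which the t quantiles approach from above:

```r
dfs <- c(5, 59, 149, 1000)                      # illustrative degrees of freedom
q95 <- setNames(qt(p = 0.95, df = dfs), dfs)
q95

# Limit as df grows: the 95th percentile of the standard normal, 1.644854
qnorm(0.95)
```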

\(\chi^2\)-distribution

Task 1

Generate 100 random values from the \(\chi^2\)-distribution with df = 7.
Plot a histogram and compare it with the probability density function of the \(\chi^2\)-distribution with df = 7.

Use the function rchisq(…) to generate random values from the chi-square distribution. Use the function dchisq(…) for the density function.

Solution 1

x <- rchisq(n = 100, df = 7)
hist(x, freq = FALSE)
curve(expr = dchisq(x, df = 7), col = "red", lwd = 2, add = TRUE)
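A \(\chi^2\) variable with k degrees of freedom is the sum of k squared independent standard normal variables. The sketch below builds such sums by simulation (seed and number of draws are arbitrary), compares their histogram with dchisq(), and checks the theoretical mean k and variance 2k:

```r
set.seed(42)                               # arbitrary seed
k <- 7                                     # degrees of freedom
z <- matrix(rnorm(10000 * k), ncol = k)    # 10000 rows of k standard normals
s <- rowSums(z^2)                          # each row sum follows chi-square(k)

hist(s, freq = FALSE, breaks = 50, main = "Sums of 7 squared standard normals")
curve(dchisq(x, df = k), col = "red", lwd = 2, add = TRUE)

# Theory: mean = k = 7, variance = 2k = 14
c(mean = mean(s), var = var(s))
```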

Task 2*

Generate a sequence of probabilities (ranging from 0 to 1).
Produce a quantile function plot of the \(\chi^2\) distribution with 2 degrees of freedom.

Use the function qchisq(…) for the chi-square quantile function.

Solution 2*

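# Probabilities on the x-axis, quantiles on the y-axis;
# qchisq(1, df = 2) is Inf and is left out of the plot
p <- seq(from = 0, to = 1, by = 0.01)
plot(p, qchisq(p, df = 2), type = "l", xlab = "Probability", ylab = "Quantile")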
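With 2 degrees of freedom, the \(\chi^2\)-distribution coincides with an exponential distribution with rate 1/2, so its quantile function has the closed form \(-2\log(1-p)\). A short check with qexp():

```r
p <- seq(from = 0, to = 0.99, by = 0.01)
q_chisq <- qchisq(p, df = 2)
q_exp <- qexp(p, rate = 0.5)              # equivalently -2 * log(1 - p)
all.equal(q_chisq, q_exp, tolerance = 1e-6)
```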

\(F\)-distribution

Task 1

Generate a sequence ranging from 1 to 20.
Evaluate the densities for four different \(F\)-distributions with degrees of freedom (df1, df2): (3, 1), (3, 3), (3, 6) and (6, 6).
Plot these densities, using different colors for each.
Calculate the area under the curve over the interval [0, 1.5] for an \(F\)-curve with df1 = 10 and df2 = 20.

Use the function df(…) for the density function. Use the function pf(…) to calculate the area under the F-curve.

Solution 1

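x <- seq(from = 1, to = 20, by = 0.1)
y1 <- df(x, df1 = 3, df2 = 1)
y2 <- df(x, df1 = 3, df2 = 3)
y3 <- df(x, df1 = 3, df2 = 6)
y4 <- df(x, df1 = 6, df2 = 6)

# Set ylim so that all four curves fit (the range of y1 alone is too small)
plot(x, y1, type = "l", ylim = range(y1, y2, y3, y4), ylab = "Density")
lines(x, y2, col = "red")
lines(x, y3, col = "blue")
lines(x, y4, col = "purple")
legend("topright", legend = c("(3, 1)", "(3, 3)", "(3, 6)", "(6, 6)"),
       col = c("black", "red", "blue", "purple"), lty = 1, title = "(df1, df2)")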

pf(q = 1.5, df1 = 10, df2 = 20, lower.tail = TRUE)

## [1] 0.7890535
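pf() returns the area under the F density to the left of q. The same area can be approximated by numerically integrating df() with integrate(), which passes the extra named arguments df1 and df2 on to df():

```r
# Numerical integration of the F density over [0, 1.5]
area <- integrate(df, lower = 0, upper = 1.5, df1 = 10, df2 = 20)
area$value

# Closed-form cdf value for comparison
pf(q = 1.5, df1 = 10, df2 = 20)
```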

Binomial distribution

Task 1

We have 10 multiple choice questions in the exam. Each question has 6 possible answers, but only one of them is correct.

What is the probability of having exactly 3 correct answers?
What is the probability of having 3 or fewer correct answers?

Use the function dbinom(…) for the probability mass function. Use the function pbinom(…) for the cumulative distribution function.

Solution 1

# Since only one out of 6 possible answers is correct, the probability
# of answering a question correctly is 1/6 (rounded here to 0.17)
dbinom(x = 3, size = 10, prob = 0.17)

## [1] 0.1599833

# For the probability of having 3 or fewer correct answers we 
# sum dbinom over x = 0, ..., 3.
p1 <- dbinom(x = 0, size = 10, prob = 0.17)
p2 <- dbinom(x = 1, size = 10, prob = 0.17)
p3 <- dbinom(x = 2, size = 10, prob = 0.17)
p4 <- dbinom(x = 3, size = 10, prob = 0.17)

p1 + p2 + p3 + p4

## [1] 0.9258528

# Alternatively
pbinom(q = 3, size = 10, prob = 0.17)

## [1] 0.9258528
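Since dbinom() is vectorised over x, the four separate calls can be replaced by a single sum. The sketch also checks dbinom() against the binomial formula \(\binom{n}{k}p^k(1-p)^{n-k}\). The last line uses the exact probability 1/6 instead of the rounded 0.17, so its result differs slightly from the values above.

```r
p <- 0.17  # rounded value of 1/6, as in the solution above

# One vectorised call covers x = 0, 1, 2, 3
total <- sum(dbinom(x = 0:3, size = 10, prob = p))
all.equal(total, pbinom(q = 3, size = 10, prob = p))

# Binomial pmf written out: choose(n, k) * p^k * (1 - p)^(n - k)
manual <- choose(10, 3) * p^3 * (1 - p)^(10 - 3)
all.equal(manual, dbinom(x = 3, size = 10, prob = p))

# Same question with the exact probability 1/6
pbinom(q = 3, size = 10, prob = 1/6)
```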

Task 2*

We have 10 multiple choice questions in the exam. Each question has 4 possible answers, but only one of them is correct.

What is the probability of having 4 correct answers?
What is the probability of having 4 or more correct answers?

Use the function dbinom(…) for the probability mass function. Use the function pbinom(…) for the cumulative distribution function.

Solution 2*

# Since only one out of 4 possible answers is correct, 
# the probability of answering a question correctly is 1/4 = 0.25
dbinom(x = 4, size = 10, prob = 0.25)

## [1] 0.145998

# For the probability of having four or more correct answers 
# we sum dbinom over x = 4, ..., 10.
p1 <- dbinom(x = 4, size = 10, prob = 0.25)
p2 <- dbinom(x = 5, size = 10, prob = 0.25)
p3 <- dbinom(x = 6, size = 10, prob = 0.25)
p4 <- dbinom(x = 7, size = 10, prob = 0.25)
p5 <- dbinom(x = 8, size = 10, prob = 0.25)
p6 <- dbinom(x = 9, size = 10, prob = 0.25)
p7 <- dbinom(x = 10, size = 10, prob = 0.25)

p1 + p2 + p3 + p4 + p5 + p6 + p7

## [1] 0.2241249

# Alternatively
# Note: if lower.tail = TRUE (default), probabilities are Pr(X ≤ x); otherwise, Pr(X > x).
# Pr(X > x) does not include x, so for Pr(X ≥ 4) we use q = 4 - 1 = 3.
pbinom(q = 3, size = 10, prob = 0.25, lower.tail = FALSE)

## [1] 0.2241249
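The same upper-tail probability can be obtained in three equivalent ways: with the complement rule 1 - Pr(X ≤ 3), by summing dbinom() over x = 4, ..., 10, or with lower.tail = FALSE. A minimal check that they agree:

```r
p <- 0.25

complement_rule <- 1 - pbinom(q = 3, size = 10, prob = p)            # 1 - Pr(X <= 3)
summed_pmf <- sum(dbinom(x = 4:10, size = 10, prob = p))              # Pr(X = 4) + ... + Pr(X = 10)
upper_tail <- pbinom(q = 3, size = 10, prob = p, lower.tail = FALSE)  # Pr(X > 3)

c(complement_rule, summed_pmf, upper_tail)
```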